Pádraig Cunningham, Marc van Dongen,

Authors

  • Pádraig Cunningham
  • Tim Fernando
  • Carl Vogel
  • Barry O'Sullivan
Abstract

In this paper, we present a novel system which automatically converts text documents into XML by using machine-learning techniques. In the first phase, the system uses the Self-Organizing Map (SOM) algorithm to arrange marked-up documents on a two-dimensional map such that documents similar in content appear closer to each other. In the second phase, it uses the inductive learning algorithm C5.0 to automatically extract and apply markup information (in the form of rules) from the nearest SOM neighbours of an unmarked document. The system is designed to have an adaptive behaviour, so that once a document is marked up into XML, it learns from its errors to improve accuracy. The resulting marked-up document is again categorized on the SOM. The results of our experiments with a number of document sets from different domains indicate that our approach is practical.
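The first phase described above can be illustrated with a minimal sketch, assuming each marked-up document has already been reduced to a small numeric feature vector (for example, term frequencies). The function names, the 3x3 map size, the learning-rate schedule and the toy vectors are all hypothetical choices made for illustration, not details taken from the paper. The second phase (C5.0 rule induction over the retrieved neighbours) is not shown.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the map cell whose weight vector is closest to `vector`."""
    return min(weights, key=lambda cell: math.dist(weights[cell], vector))


def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a tiny SOM; returns a dict mapping (row, col) -> weight vector.

    The decay schedules below are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # shrinks from 1 towards 0
        rate = 0.5 * frac + 0.01               # learning rate
        radius = max(rows, cols) / 2 * frac + 0.5  # neighbourhood radius
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for cell, w in weights.items():
                d = math.dist(cell, bmu)       # distance on the 2-D map
                if d <= radius:
                    influence = math.exp(-(d * d) / (2 * radius * radius))
                    for i in range(dim):
                        w[i] += rate * influence * (v[i] - w[i])
    return weights


def nearest_marked_up(weights, marked, query, k=2):
    """Return the k marked-up documents whose map cells lie closest
    to the cell of the unmarked `query` document."""
    q_cell = best_matching_unit(weights, query)

    def map_distance(name):
        return math.dist(best_matching_unit(weights, marked[name]), q_cell)

    return sorted(marked, key=map_distance)[:k]


# Hypothetical feature vectors for three already marked-up documents.
marked = {
    "letter_a": [1.0, 0.0, 0.0],
    "letter_b": [0.9, 0.1, 0.0],
    "recipe_a": [0.0, 0.1, 1.0],
}
som = train_som(list(marked.values()), rows=3, cols=3)
neighbours = nearest_marked_up(som, marked, [1.0, 0.0, 0.0], k=2)
```

In the system the abstract describes, the markup of the retrieved neighbours would then serve as training material for the rule learner. Here `neighbours` only names the candidate documents.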


Similar resources

Network of Excellence Multimedia Understanding through Semantics, Computation and LEarning

We present an automatic focus area estimation method, working with a single image without a priori information about the image, the camera, or the scene. It produces relative focus maps by localized blind deconvolution and a new residual error-based classification. Evaluation and comparison is performed and applicability is shown through image indexing.


FIONN: A Framework for Developing CBR Systems

Case-Based Reasoning (CBR) is a very popular methodology for developing knowledge-based systems [1]. Yet there are few toolkits available for building CBR systems. In this paper we present a framework called Fionn that is specifically designed for the development of CBR systems. Since Fionn was designed specifically for CBR it provides good support for some of the unique characteristics of CBR ...



Publication date: 2003